Torque Transfer
Contains code for Torque Transfer, USC CSCI 599 Spring 2020.
GitHub link: https://github.com/csci-599-applied-ml-for-games/Torque-Transfer-
Objective
The main objective is to apply Transfer Learning with Reinforcement Learning at the core.
The goal is to demonstrate how Transfer Learning can be used across simulators and, eventually, in a real-life environment. Using knowledge gained in one self-driving environment, we train the same agent in a similar environment and compare the learning curves of the agent with transfer learning against the one trained without it.
Methodology
TORCS:
- Sensor Input: An MLP policy network trained with PPO maps the sensor state to actions such as throttle and steering. Curriculum learning is used, progressing from simple tasks to more complex ones (a minimal PPO sketch appears after this list).
- Image Input: Navigates using the rendered image as input to a CNN, with steering as the only action. PPO is the most effective approach here, but it tends to “hack” the reward by avoiding longer stretches and taking shortcuts.
- Imitation Learning: Bridges the sensor-input and image-input setups to avoid shortcuts: a CNN is trained to predict the driver's action from the image. The loss function is MSE, and the agent masters the game after 50 epochs (a supervised-training sketch appears below).
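A minimal sketch of the sensor-input PPO setup, written against stable-baselines3 rather than this repo's own training code; the `TorcsSensorEnv` wrapper, its 29-dimensional sensor vector, and the timestep budget are all assumptions made for illustration (the stand-in environment returns random observations so the script runs end to end):

```python
# PPO-on-sensor-state sketch (illustrative; not the project's exact training loop).
# TorcsSensorEnv is a hypothetical stand-in for a Gym-style wrapper around the
# TORCS client.
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO


class TorcsSensorEnv(gym.Env):
    """Hypothetical TORCS wrapper: sensor vector in, [steering, throttle] out."""

    def __init__(self):
        # e.g. 29 sensor readings (track rangefinders, speed, angle, ...)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(29,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        obs = self.observation_space.sample()
        reward = 0.0  # the real reward would come from TORCS (progress along the track, etc.)
        return obs, reward, False, False, {}


env = TorcsSensorEnv()
model = PPO("MlpPolicy", env, verbose=1)  # MLP actor-critic over the sensor state
model.learn(total_timesteps=200_000)      # a curriculum would swap in harder tracks over time
model.save("torcs_sensor_ppo")
```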
Because the image looks the same whether the car is facing forward or backward on the track, the state space is aliased. The PPO critic takes only the state as input, so it gets confused when predicting the advantage, which causes a drop in the learning curve.
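Since the imitation-learning variant above is plain supervised regression (no critic involved), a sketch only needs a CNN and an MSE loss. The image size, layer sizes, and the `images` / `driver_actions` arrays below are placeholders; the repo's actual architecture and preprocessing may differ:

```python
# Supervised imitation-learning sketch (illustrative; not the repo's exact model).
import numpy as np
from tensorflow.keras import layers, models

IMG_H, IMG_W, IMG_C = 64, 64, 3          # hypothetical image dimensions

model = models.Sequential([
    layers.Conv2D(16, 5, strides=2, activation="relu",
                  input_shape=(IMG_H, IMG_W, IMG_C)),
    layers.Conv2D(32, 3, strides=2, activation="relu"),
    layers.Conv2D(64, 3, strides=2, activation="relu"),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="tanh"),   # predicted steering command in [-1, 1]
])

# MSE regression onto the recorded driver's steering command.
model.compile(optimizer="adam", loss="mse")

# images: (N, H, W, C) frames; driver_actions: (N, 1) steering labels (placeholders here).
images = np.zeros((8, IMG_H, IMG_W, IMG_C), dtype=np.float32)
driver_actions = np.zeros((8, 1), dtype=np.float32)
model.fit(images, driver_actions, epochs=50, batch_size=4)
```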
Donkey Car Sim:
- Image Input: The image dimensions are kept the same as in TORCS, and a similar track is chosen to provide a comparable training environment.
- Ground-up training: One model was trained entirely from scratch with the model architecture described above.
- Transfer-Learning-based training: Another model was initialized from an agent pre-trained on TORCS. Transfer learning was implemented by replacing only the final Dense layer and keeping all layers trainable (a sketch follows this list).
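A minimal sketch of that layer-replacement step, assuming a Keras model saved from the TORCS training (the file name `torcs_policy.h5` and the layer name are placeholders); only the final Dense layer is swapped out and every layer remains trainable:

```python
# Transfer-learning sketch (illustrative): reuse the TORCS-trained network,
# replace only the final Dense layer, and leave every layer trainable.
from tensorflow.keras import layers, models

base = models.load_model("torcs_policy.h5")   # placeholder path for the pre-trained model

# Take everything up to (but not including) the old output layer.
features = base.layers[-2].output
new_output = layers.Dense(1, activation="tanh", name="donkey_steering")(features)

transfer_model = models.Model(inputs=base.input, outputs=new_output)
for layer in transfer_model.layers:
    layer.trainable = True                    # all layers stay trainable, as described above

# transfer_model is then fine-tuned in the Donkey Car simulator with the same
# training procedure used for TORCS.
```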
Results
The agent that used Transfer Learning from TORCS to the Donkey Car Simulator performed better in the following ways:
- Has a higher average reward.
- Takes fewer episodes to train.
- Drives more stably in test mode.
The following video shows a demo of our project with an explanation:
Here is the link to the paper:
https://drive.google.com/file/d/1szvbnzQd4vPkF-I6dM8p47iKH5mu0oIF/view?usp=sharing
Here is the link to the presentation:
https://drive.google.com/file/d/1JZsqM1Tf6chaGBK5BR000iyisoCKoxu6
Contributors
Shashank Hegde - https://www.linkedin.com/in/karkala-shashank-hegde/
Sriram Ramaswamy - https://www.linkedin.com/in/sriramvera/
Sumeet Bachani - https://www.linkedin.com/in/sumeetbachani/
Tushar Kumar - https://www.linkedin.com/in/tushartk/